Use Kyuubi JDBC to Access LakeSoul's Table

Tip: Available since version 2.4.

LakeSoul implements Flink and Spark connectors, so you can run Spark SQL or Flink SQL queries against LakeSoul tables through Kyuubi.

Requirements

Component   Version
Kyuubi      1.8
Spark       3.3
Flink       1.17
LakeSoul    2.5.3
Java        1.8

The operating environment is Linux, with Spark, Flink, and Kyuubi already installed. Running the Kyuubi engines on Hadoop YARN is recommended; alternatively, you can start a local Spark or Flink cluster.
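Before continuing, it may help to confirm that the installation directories referenced in this guide are actually set. The following is a minimal sketch, not part of LakeSoul or Kyuubi; the function name check_homes is hypothetical, and it assumes the variables KYUUBI_HOME, SPARK_HOME, and FLINK_HOME are how your environment locates each component.

```shell
# Report which of the named home variables are unset or point to a
# missing directory. Prints one line per problem and returns non-zero
# if any problem was found. (check_homes is a hypothetical helper.)
check_homes() {
  rc=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "dir=\${$name:-}"
    if [ -z "$dir" ]; then
      echo "$name is not set"
      rc=1
    elif [ ! -d "$dir" ]; then
      echo "$name points to a missing directory: $dir"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (commented out so that sourcing this snippet has no side effects):
# check_homes KYUUBI_HOME SPARK_HOME FLINK_HOME
```

Only pass the variables for the engines you intend to use; for example, a Spark-only setup does not need FLINK_HOME.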

Deploy Kyuubi engines on Yarn.

1. Dependencies

Download LakeSoul Flink Jar from: https://github.com/lakesoul-io/LakeSoul/releases/download/v2.5.3/lakesoul-flink-1.17-2.5.3.jar

Put the jar file under $FLINK_HOME/lib.
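The download and copy step can be scripted. The sketch below is illustrative only: the helper names lakesoul_jar_url and install_lakesoul_jar are hypothetical, and the URL pattern is inferred from the release links given in this guide, so it may not hold for other versions.

```shell
# Build the release URL for a LakeSoul jar, following the pattern of the
# links in this guide (an assumption for other versions).
# $1: engine ("flink" or "spark"), $2: engine version, $3: LakeSoul version
lakesoul_jar_url() {
  echo "https://github.com/lakesoul-io/LakeSoul/releases/download/v$3/lakesoul-$1-$2-$3.jar"
}

# Download the jar into a target directory ($4).
# -f makes curl fail on HTTP errors; -L follows the redirect that
# GitHub release downloads issue.
install_lakesoul_jar() {
  url=$(lakesoul_jar_url "$1" "$2" "$3")
  curl -fL -o "$4/$(basename "$url")" "$url"
}

# Usage (needs network access, so it is left commented out):
# install_lakesoul_jar flink 1.17 2.5.3 "$FLINK_HOME/lib"
```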

2. Configurations

Set the PostgreSQL (PG) parameters for LakeSoul according to this document: Setup Metadata Database Connection for Flink

After this, you can start a Flink session cluster or application as usual.

3. LakeSoul Operations

Use Kyuubi beeline to connect to the Flink SQL engine:

$KYUUBI_HOME/bin/beeline -u 'jdbc:hive2://localhost:10009/default;user=admin;?kyuubi.engine.type=FLINK_SQL'
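If you connect from scripts, the URL can be composed from its parts. This is a small sketch that reproduces the shape of the URL above; kyuubi_jdbc_url is a hypothetical helper, and the default host, port, and user are simply the values used in this guide.

```shell
# Compose a Kyuubi JDBC URL in the same shape as the command above.
# $1: engine type (e.g. FLINK_SQL or SPARK_SQL)
# $2: host:port (defaults to localhost:10009, as in this guide)
# $3: user (defaults to admin, as in this guide)
kyuubi_jdbc_url() {
  echo "jdbc:hive2://${2:-localhost:10009}/default;user=${3:-admin};?kyuubi.engine.type=$1"
}

# Usage (needs a running Kyuubi server, so it is left commented out).
# beeline's -e option runs a single statement and then exits:
# "$KYUUBI_HOME/bin/beeline" -u "$(kyuubi_jdbc_url FLINK_SQL)" -e 'show catalogs;'
```

Keep the URL quoted when passing it to beeline, since the semicolons and question mark would otherwise be interpreted by the shell.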

Access LakeSoul with Flink SQL:

create catalog lakesoul with('type'='lakesoul');
use catalog lakesoul;
use `default`;

create table if not exists test_lakesoul_table_v1 (
  `id` INT,
  name STRING,
  score INT,
  `date` STRING,
  region STRING,
  PRIMARY KEY (`id`,`name`) NOT ENFORCED
) PARTITIONED BY (`region`,`date`)
WITH (
  'connector'='lakeSoul',
  'use_cdc'='true',
  'format'='lakesoul',
  'path'='hdfs:///lakesoul-test-bucket/default/test_lakesoul_table_v1/',
  'hashBucketNum'='4'
);

insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (1,'AAA', 100, '2023-05-11', 'China');
insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (2,'BBB', 100, '2023-05-11', 'China');
insert into `lakesoul`.`default`.test_lakesoul_table_v1 values (3,'AAA', 98, '2023-05-10', 'China');

select * from `lakesoul`.`default`.test_lakesoul_table_v1 limit 1;

drop table `lakesoul`.`default`.test_lakesoul_table_v1;

You can replace the hdfs:// scheme in the table path with file:// to use the local file system.
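When the statements are kept in a script file, the scheme swap can be applied mechanically. The following sketch uses a hypothetical helper, to_local_paths, and assumes the paths appear as single-quoted literals starting with hdfs://, as in the statements above.

```shell
# Rewrite the scheme of quoted table paths in SQL text read from stdin,
# turning 'hdfs://... into 'file://... and leaving everything else as is.
to_local_paths() {
  sed "s|'hdfs://|'file://|g"
}

# Usage (statements.sql is a hypothetical file holding the SQL above):
# to_local_paths < statements.sql > statements-local.sql
```

Note that only the scheme changes: the rest of the path is kept, so it must be a location that is writable on the local file system.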

For more details about using Flink SQL with LakeSoul, refer to: Flink Lakesoul Connector

Spark SQL Access LakeSoul's Table

1. Dependencies

Download LakeSoul Spark Jar from: https://github.com/lakesoul-io/LakeSoul/releases/download/v2.5.3/lakesoul-spark-3.3-2.5.3.jar

Put the jar file under $SPARK_HOME/jars.

2. Configurations

  1. Set the PostgreSQL (PG) parameters for LakeSoul according to this document: Setup Metadata Database Connection for Spark

  2. Add the following Spark configuration to $SPARK_CONF_DIR/spark-defaults.conf:

    spark.sql.extensions=com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension
    spark.sql.catalog.lakesoul=org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog
    spark.sql.defaultCatalog=lakesoul
    spark.sql.caseSensitive=false
    spark.sql.legacy.parquet.nanosAsLong=false
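The configuration lines can be appended from a script without creating duplicates on repeated runs. This is a minimal sketch; add_conf_line is a hypothetical helper, and it assumes the existing file, if any, ends with a newline.

```shell
# Append a property line to a conf file unless the exact line is already
# present. $1: conf file, $2: "key=value" line.
add_conf_line() {
  touch "$1"
  # -x matches whole lines only, -F treats the line as a fixed string.
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage (commented out; it would modify your Spark configuration):
# conf="$SPARK_CONF_DIR/spark-defaults.conf"
# add_conf_line "$conf" "spark.sql.extensions=com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension"
# add_conf_line "$conf" "spark.sql.catalog.lakesoul=org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog"
# add_conf_line "$conf" "spark.sql.defaultCatalog=lakesoul"
# add_conf_line "$conf" "spark.sql.caseSensitive=false"
# add_conf_line "$conf" "spark.sql.legacy.parquet.nanosAsLong=false"
```

The helper only detects exact duplicates. If a key is already set to a different value, edit the file by hand rather than appending a second entry.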

3. LakeSoul Operations

Use Kyuubi beeline to connect to the Spark SQL engine:

$KYUUBI_HOME/bin/beeline -u 'jdbc:hive2://localhost:10009/default;user=admin;?kyuubi.engine.type=SPARK_SQL'

Access LakeSoul with Spark SQL:

use default;

create table if not exists test_lakesoul_table_v2 (
  id INT,
  name STRING,
  score INT,
  date STRING,
  region STRING
) USING lakesoul
PARTITIONED BY (region,date)
LOCATION 'hdfs:///lakesoul-test-bucket/default/test_lakesoul_table_v2/';

insert into test_lakesoul_table_v2 values (1,'AAA', 100, '2023-05-11', 'China');
insert into test_lakesoul_table_v2 values (2,'BBB', 100, '2023-05-11', 'China');
insert into test_lakesoul_table_v2 values (3,'AAA', 98, '2023-05-10', 'China');

select * from test_lakesoul_table_v2 limit 1;

drop table test_lakesoul_table_v2;

You can replace the hdfs:// scheme in the table location with file:// to use the local file system.

For more details about using Spark SQL with LakeSoul, refer to: Operate LakeSoulTable by Spark SQL